Abstract

This paper proposes a class of estimators of the population correlation coefficient in two-phase sampling and analyzes its properties. The setting is one in which the population mean and population variance of one of the variables are not available, but those of another (auxiliary) variable are. The optimum estimator in the class is identified, together with its variance formula. The estimators of the class involve unknown constants whose optimum values depend on unknown population parameters. Following (Singh, 1982) and (Srivastava and Jhajj, 1983), it is shown that when these population parameters are replaced by their consistent estimates, the resulting class of estimators has the same asymptotic variance as the optimum estimator. An empirical study is carried out to demonstrate the performance of the constructed estimators.
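1. Introduction
Consider a finite population $U = \{1, 2, \ldots, i, \ldots, N\}$. Let y and x be the study and auxiliary variables, taking values $y_i$ and $x_i$, respectively, for the ith unit. The correlation coefficient between y and x is defined by

$$\rho_{yx} = \frac{S_{yx}}{S_y S_x} \qquad (1.1)$$

where

$$S_{yx} = (N-1)^{-1}\sum_{i=1}^{N}(y_i-\bar Y)(x_i-\bar X),\quad S_x^2 = (N-1)^{-1}\sum_{i=1}^{N}(x_i-\bar X)^2,\quad S_y^2 = (N-1)^{-1}\sum_{i=1}^{N}(y_i-\bar Y)^2,$$
$$\bar X = N^{-1}\sum_{i=1}^{N}x_i,\qquad \bar Y = N^{-1}\sum_{i=1}^{N}y_i.$$

Based on a simple random sample of size n drawn without replacement, $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the usual estimator of $\rho_{yx}$ is the corresponding sample correlation coefficient

$$r = \frac{s_{yx}}{s_x s_y} \qquad (1.2)$$

where

$$s_{yx} = (n-1)^{-1}\sum_{i=1}^{n}(y_i-\bar y)(x_i-\bar x),\quad s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar x)^2,\quad s_y^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_i-\bar y)^2,$$
$$\bar x = n^{-1}\sum_{i=1}^{n}x_i,\qquad \bar y = n^{-1}\sum_{i=1}^{n}y_i.$$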
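For readers who want to reproduce $r$ numerically, the following is a minimal Python sketch of (1.2) using the $(n-1)$ divisors defined above. The function name `sample_corr` is illustrative and not part of the paper.

```python
import math


def sample_corr(x, y):
    """Sample correlation r = s_yx / (s_x * s_y) of (1.2), using (n - 1) divisors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal size n >= 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # s_yx, s_x^2 and s_y^2 exactly as defined below (1.2)
    s_yx = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return s_yx / math.sqrt(s_x2 * s_y2)


print(sample_corr([1, 2, 3, 4], [1, 3, 2, 4]))  # approximately 0.8
```

The $(n-1)$ divisors cancel in the ratio, so the same value is obtained with divisor n.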

The problem of estimating $\rho_{yx}$ has been taken up earlier by various authors, including (Koop, 1970), (Gupta et al., 1978, 1979), (Wakimoto, 1971), (Gupta and Singh, 1989), (Rana, 1989) and (Singh et al., 1996), in different situations. (Srivastava and Jhajj, 1986) further considered the problem of estimating $\rho_{yx}$ in situations where information on the auxiliary variable x is available for all units in the population. In such situations, they suggested a class of estimators of $\rho_{yx}$ which utilizes the known values of the population mean $\bar X$ and the population variance $S_x^2$ of the auxiliary variable x.

In this paper, using a two-phase sampling mechanism, we consider a class of estimators of $\rho_{yx}$ that uses the available knowledge ($\bar Z$ and $S_z^2$) on a second auxiliary variable z, when the population mean $\bar X$ and population variance $S_x^2$ of the main auxiliary variable x are not known.


2. The Suggested Class of Estimators 


In many situations of practical importance, no information may be available on the population mean $\bar X$ and population variance $S_x^2$. We then seek to estimate the population correlation coefficient $\rho_{yx}$ from a sample s obtained through two-phase selection. With simple random sampling without replacement in each phase, the two-phase sampling scheme is as follows:

(i) The first-phase sample $s^{*}$ ($s^{*} \subset U$) of fixed size $n_1$ is drawn to observe only x, in order to furnish good estimates of $\bar X$ and $S_x^2$.
(ii) Given $s^{*}$, the second-phase sample s ($s \subset s^{*}$) of fixed size n is drawn to observe y only.
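Let

$$\bar x = \frac{1}{n}\sum_{i\in s}x_i,\quad \bar y = \frac{1}{n}\sum_{i\in s}y_i,\quad \bar x^{*} = \frac{1}{n_1}\sum_{i\in s^{*}}x_i,\quad s_x^2 = (n-1)^{-1}\sum_{i\in s}(x_i-\bar x)^2,\quad s_x^{*2} = (n_1-1)^{-1}\sum_{i\in s^{*}}(x_i-\bar x^{*})^2.$$

We write $u = \bar x/\bar x^{*}$, $v = s_x^2/s_x^{*2}$. Whatever the sample chosen, let (u, v) assume values in a bounded, closed, convex subset R of the two-dimensional real space containing the point (1, 1). Let h(u, v) be a function of u and v such that

$$h(1,1) = 1 \qquad (2.1)$$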


and such that it satisfies the following conditions: 




1. The function h(u,v) is continuous and bounded in R.
2. The first and second partial derivatives of h(u,v) exist and are continuous and 
bounded in R. 
Now one may consider the class of estimators of $\rho_{yx}$ defined by

$$\hat\rho_{hd} = r\,h(u,v) \qquad (2.2)$$

which is the double-sampling version of the class of estimators $r\,h(u^{*},v^{*})$ suggested by (Srivastava and Jhajj, 1986), where $u^{*} = \bar x/\bar X$, $v^{*} = s_x^2/S_x^2$ and $(\bar X, S_x^2)$ are known.
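The two-phase scheme and the estimator (2.2) can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is synthetic, and $h(u,v) = u^{a_1}v^{a_2}$ is just one convenient choice satisfying (2.1); the names `two_phase_srswor`, `rho_hd`, `a1` and `a2` are hypothetical.

```python
import math
import random


def sample_var(vals):
    """Sample variance with divisor (len(vals) - 1)."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def two_phase_srswor(N, n1, n, rng):
    """Phase 1: s* of size n1 from U = {0, ..., N-1}; phase 2: s of size n from s*."""
    s_star = rng.sample(range(N), n1)
    s = rng.sample(s_star, n)
    return s_star, s


def rho_hd(x, y, s_star, s, a1, a2):
    """r * h(u, v) with the illustrative choice h(u, v) = u**a1 * v**a2 (so h(1, 1) = 1).

    y values are read only for units in s; x values for units in s*.
    """
    xs = [x[i] for i in s]
    ys = [y[i] for i in s]
    x_star = [x[i] for i in s_star]
    n = len(s)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_yx = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(xs, ys)) / (n - 1)
    r = s_yx / math.sqrt(sample_var(xs) * sample_var(ys))
    u = xbar / (sum(x_star) / len(x_star))  # u = xbar / xbar*
    v = sample_var(xs) / sample_var(x_star)  # v = s_x^2 / s_x*^2
    return r * u ** a1 * v ** a2


# Synthetic population (hypothetical, for illustration only).
rng = random.Random(1)
N = 80
x = [rng.uniform(10, 50) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0, 5) for xi in x]
s_star, s = two_phase_srswor(N, 25, 10, rng)
print(rho_hd(x, y, s_star, s, a1=0.5, a2=-0.3))
```

When the first-phase sample is taken equal to the second-phase sample, u = v = 1 and the estimator collapses to r whatever the exponents, which is a quick consistency check on the code.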
Sometimes, even if the population mean $\bar X$ and population variance $S_x^2$ of x are not known, information on a cheaply ascertainable variable z, closely related to x but, compared to x, only remotely related to y, is available on all units of the population. This type of situation has been briefly discussed by, among others, (Chand, 1975) and (Kiregyera, 1980, 1984).


Following (Chand, 1975), one may define a chain ratio-type estimator for $\rho_{yx}$ as


CIE cs 


where the population mean $\bar Z$ and population variance $S_z^2$ of the second auxiliary variable z are known, and

$$\bar z^{*} = \frac{1}{n_1}\sum_{i\in s^{*}} z_i,\qquad s_z^{*2} = (n_1-1)^{-1}\sum_{i\in s^{*}}(z_i-\bar z^{*})^2$$

are the sample mean and sample variance of z based on the preliminary large sample $s^{*}$ of size $n_1$ (> n).


The estimator in (2.3) may be generalized as

$$\hat\rho_{\alpha d} = r\left(\frac{\bar x}{\bar x^{*}}\right)^{\alpha_1}\left(\frac{s_x^2}{s_x^{*2}}\right)^{\alpha_2}\left(\frac{\bar z^{*}}{\bar Z}\right)^{\alpha_3}\left(\frac{s_z^{*2}}{S_z^2}\right)^{\alpha_4} \qquad (2.4)$$

where the $\alpha_i$ (i = 1, 2, 3, 4) are suitably chosen constants.

Many other generalizations of $\hat\rho_{\alpha d}$ are possible. We have, therefore, considered a more general class of estimators of $\rho_{yx}$ from which a number of estimators can be generated.
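Computationally, (2.4) is r multiplied by four power ratios. The sketch below assumes the first-phase statistics and the known $\bar Z$, $S_z^2$ are already available as numbers; the function and argument names are illustrative, and the numbers in the example are hypothetical, not taken from the paper's data.

```python
def rho_ad(r, xbar, xbar1, sx2, sx2_1, zbar1, Zbar, sz2_1, Sz2, alphas):
    """Power-form estimator (2.4).

    r      : second-phase sample correlation
    xbar   : mean of x over s;      xbar1 : mean of x over s*
    sx2    : variance of x over s;  sx2_1 : variance of x over s*
    zbar1  : mean of z over s*;     Zbar  : known population mean of z
    sz2_1  : variance of z over s*; Sz2   : known population variance of z
    alphas : (alpha1, alpha2, alpha3, alpha4), suitably chosen constants
    """
    a1, a2, a3, a4 = alphas
    u = xbar / xbar1
    v = sx2 / sx2_1
    w = zbar1 / Zbar
    a = sz2_1 / Sz2
    return r * u ** a1 * v ** a2 * w ** a3 * a ** a4


# Hypothetical summary statistics, for illustration only.
print(rho_ad(0.9, 101.0, 100.0, 380.0, 400.0, 49.0, 50.0, 110.0, 100.0,
             (-1.0, -1.0, -1.0, -1.0)))
```

Setting all the $\alpha_i$ to zero, or feeding in ratios that are all equal to one, returns r unchanged.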

The proposed generalized estimator of the population correlation coefficient $\rho_{yx}$ is defined by

$$\hat\rho_{td} = r\,t(u,v,w,a) \qquad (2.5)$$

where $w = \bar z^{*}/\bar Z$, $a = s_z^{*2}/S_z^2$ and $t(u,v,w,a)$ is a function of (u, v, w, a) such that

$$t(1,1,1,1) = 1 \qquad (2.6)$$

and satisfying the following conditions:
(i) Whatever the samples ($s^{*}$ and s) chosen, let (u, v, w, a) assume values in a closed convex subset S of the four-dimensional real space containing the point P = (1, 1, 1, 1).
(ii) In S, the function t(u, v, w, a) is continuous and bounded.
(iii) The first- and second-order partial derivatives of t(u, v, w, a) exist and are continuous and bounded in S.
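To find the bias and variance of $\hat\rho_{td}$, we write

$$s_y^2 = S_y^2(1+e_0),\quad \bar x = \bar X(1+e_1),\quad \bar x^{*} = \bar X(1+e_1^{*}),\quad s_x^2 = S_x^2(1+e_2),$$
$$s_x^{*2} = S_x^2(1+e_2^{*}),\quad \bar z^{*} = \bar Z(1+e_3^{*}),\quad s_z^{*2} = S_z^2(1+e_4^{*}),\quad s_{yx} = S_{yx}(1+e_5)$$

such that $E(e_0) = E(e_1) = E(e_2) = E(e_5) = 0$ and $E(e_i^{*}) = 0$ for all i = 1, 2, 3, 4. Ignoring the finite population correction terms, we write, to the first degree of approximation,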
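$$\begin{gathered}
E(e_0^2) = \frac{\delta_{400}-1}{n},\quad E(e_1^2) = \frac{C_x^2}{n},\quad E(e_2^2) = \frac{\delta_{040}-1}{n},\quad E(e_5^2) = \frac{(\delta_{220}/\rho_{yx}^2) - 1}{n},\\
E(e_0e_1) = \frac{\delta_{210}C_x}{n},\quad E(e_0e_2) = \frac{\delta_{220}-1}{n},\quad E(e_0e_5) = \frac{(\delta_{310}/\rho_{yx}) - 1}{n},\quad E(e_1e_2) = \frac{\delta_{030}C_x}{n},\\
E(e_1e_5) = \frac{\delta_{120}C_x/\rho_{yx}}{n},\quad E(e_2e_5) = \frac{(\delta_{130}/\rho_{yx}) - 1}{n},\\
E(e_1^{*2}) = \frac{C_x^2}{n_1},\quad E(e_2^{*2}) = \frac{\delta_{040}-1}{n_1},\quad E(e_3^{*2}) = \frac{C_z^2}{n_1},\quad E(e_4^{*2}) = \frac{\delta_{004}-1}{n_1},\\
E(e_0e_1^{*}) = \frac{\delta_{210}C_x}{n_1},\quad E(e_0e_2^{*}) = \frac{\delta_{220}-1}{n_1},\quad E(e_0e_3^{*}) = \frac{\delta_{201}C_z}{n_1},\quad E(e_0e_4^{*}) = \frac{\delta_{202}-1}{n_1},\\
E(e_1e_1^{*}) = \frac{C_x^2}{n_1},\quad E(e_1e_2^{*}) = \frac{\delta_{030}C_x}{n_1},\quad E(e_1e_3^{*}) = \frac{\rho_{xz}C_xC_z}{n_1},\quad E(e_1e_4^{*}) = \frac{\delta_{012}C_x}{n_1},\\
E(e_2e_1^{*}) = \frac{\delta_{030}C_x}{n_1},\quad E(e_2e_2^{*}) = \frac{\delta_{040}-1}{n_1},\quad E(e_2e_3^{*}) = \frac{\delta_{021}C_z}{n_1},\quad E(e_2e_4^{*}) = \frac{\delta_{022}-1}{n_1},\\
E(e_5e_1^{*}) = \frac{\delta_{120}C_x/\rho_{yx}}{n_1},\quad E(e_5e_2^{*}) = \frac{(\delta_{130}/\rho_{yx}) - 1}{n_1},\quad E(e_5e_3^{*}) = \frac{\delta_{111}C_z/\rho_{yx}}{n_1},\quad E(e_5e_4^{*}) = \frac{(\delta_{112}/\rho_{yx}) - 1}{n_1},\\
E(e_1^{*}e_2^{*}) = \frac{\delta_{030}C_x}{n_1},\quad E(e_1^{*}e_3^{*}) = \frac{\rho_{xz}C_xC_z}{n_1},\quad E(e_1^{*}e_4^{*}) = \frac{\delta_{012}C_x}{n_1},\\
E(e_2^{*}e_3^{*}) = \frac{\delta_{021}C_z}{n_1},\quad E(e_2^{*}e_4^{*}) = \frac{\delta_{022}-1}{n_1},\quad E(e_3^{*}e_4^{*}) = \frac{\delta_{003}C_z}{n_1},
\end{gathered}$$

where $C_x = S_x/\bar X$, $C_z = S_z/\bar Z$, $\rho_{xz}$ is the correlation coefficient between x and z,

$$\delta_{pqm} = \frac{\mu_{pqm}}{\mu_{200}^{p/2}\,\mu_{020}^{q/2}\,\mu_{002}^{m/2}},\qquad \mu_{pqm} = \frac{1}{N}\sum_{i=1}^{N}(y_i-\bar Y)^p(x_i-\bar X)^q(z_i-\bar Z)^m,$$

and (p, q, m) are non-negative integers.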
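The standardized moments $\delta_{pqm}$ enter every variance expression that follows. Here is a minimal Python sketch for computing them, assuming full population lists of y, x and z and the divisor N used in the definition of $\mu_{pqm}$. The names `mu` and `delta` are illustrative.

```python
def mu(p, q, m, y, x, z):
    """Central product moment mu_pqm with divisor N, as defined above."""
    N = len(y)
    ybar, xbar, zbar = sum(y) / N, sum(x) / N, sum(z) / N
    return sum((yi - ybar) ** p * (xi - xbar) ** q * (zi - zbar) ** m
               for yi, xi, zi in zip(y, x, z)) / N


def delta(p, q, m, y, x, z):
    """delta_pqm = mu_pqm / (mu_200^(p/2) * mu_020^(q/2) * mu_002^(m/2))."""
    return mu(p, q, m, y, x, z) / (
        mu(2, 0, 0, y, x, z) ** (p / 2)
        * mu(0, 2, 0, y, x, z) ** (q / 2)
        * mu(0, 0, 2, y, x, z) ** (m / 2)
    )


# Toy data: delta_110 is the correlation of y and x.
y = [1.0, 3.0, 2.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 1.0, 4.0, 3.0]
print(delta(1, 1, 0, y, x, z))  # approximately 0.8
```

By construction $\delta_{200} = \delta_{020} = \delta_{002} = 1$. $\delta_{030}$ and $\delta_{040}$ are the skewness and kurtosis of x, and $\delta_{110} = \rho_{yx}$.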


To find the expectation and variance of $\hat\rho_{td}$, we expand t(u, v, w, a) about the point P = (1, 1, 1, 1) in a second-order Taylor's series and express this value and the value of r in terms of e's. Expanding in powers of e's and retaining terms up to the second power, we have

$$E(\hat\rho_{td}) = \rho_{yx} + O(n^{-1}) \qquad (2.7)$$

which shows that the bias of $\hat\rho_{td}$ is of order $n^{-1}$. Hence, up to order $n^{-1}$, the mean squared error and the variance of $\hat\rho_{td}$ are the same.
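Expanding $(\hat\rho_{td} - \rho_{yx})^2$, retaining terms up to the second power in e's, taking expectation and using the above expected values, we obtain the variance of $\hat\rho_{td}$, to the first degree of approximation, as

$$\begin{aligned}
\mathrm{Var}(\hat\rho_{td}) = {}& \mathrm{Var}(r) + \frac{\rho_{yx}^2}{n}\Big[C_x^2t_1^2(P) + (\delta_{040}-1)t_2^2(P) - At_1(P) - Bt_2(P) + 2\delta_{030}C_xt_1(P)t_2(P)\Big]\\
& - \frac{\rho_{yx}^2}{n_1}\Big[C_x^2t_1^2(P) + (\delta_{040}-1)t_2^2(P) - C_z^2t_3^2(P) - (\delta_{004}-1)t_4^2(P) - At_1(P) - Bt_2(P) + Dt_3(P) + Ft_4(P)\\
& \qquad + 2\delta_{030}C_xt_1(P)t_2(P) - 2\delta_{003}C_zt_3(P)t_4(P)\Big]
\end{aligned} \qquad (2.8)$$

where $t_1(P)$, $t_2(P)$, $t_3(P)$ and $t_4(P)$ denote the first partial derivatives of t(u, v, w, a) with respect to u, v, w and a, respectively, at the point P = (1, 1, 1, 1),

$$\mathrm{Var}(r) = \frac{\rho_{yx}^2}{n}\left[\frac{\delta_{220}}{\rho_{yx}^2} + \frac14(\delta_{040} + \delta_{400} + 2\delta_{220}) - \frac{\delta_{130} + \delta_{310}}{\rho_{yx}}\right] \qquad (2.9)$$

and

$$A = \{\delta_{210} + \delta_{030} - 2(\delta_{120}/\rho_{yx})\}C_x,\qquad B = \delta_{220} + \delta_{040} - 2(\delta_{130}/\rho_{yx}),$$
$$D = \{\delta_{201} + \delta_{021} - 2(\delta_{111}/\rho_{yx})\}C_z,\qquad F = \delta_{202} + \delta_{022} - 2(\delta_{112}/\rho_{yx}).$$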
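Expression (2.9) and the constants A, B, D, F are straightforward to evaluate once the $\delta_{pqm}$ are known. The Python sketch below stores them in a dictionary keyed by the subscript string, which is an illustrative convention rather than the paper's. As a sanity check it uses bivariate-normal moments for (y, x), namely $\delta_{220} = 1 + 2\rho^2$, $\delta_{400} = \delta_{040} = 3$ and $\delta_{310} = \delta_{130} = 3\rho$. Under these moments (2.9) reduces to the classical $(1-\rho^2)^2/n$.

```python
def var_r(rho, n, d):
    """First-order Var(r) from (2.9); d maps subscript strings like "220" to delta values."""
    return (rho ** 2 / n) * (
        d["220"] / rho ** 2
        + 0.25 * (d["040"] + d["400"] + 2 * d["220"])
        - (d["130"] + d["310"]) / rho
    )


def abdf(rho, Cx, Cz, d):
    """Constants A, B, D, F used in (2.8)."""
    A = (d["210"] + d["030"] - 2 * d["120"] / rho) * Cx
    B = d["220"] + d["040"] - 2 * d["130"] / rho
    D = (d["201"] + d["021"] - 2 * d["111"] / rho) * Cz
    F = d["202"] + d["022"] - 2 * d["112"] / rho
    return A, B, D, F


# Check against the bivariate-normal case for (y, x).
rho, n = 0.5, 10
normal = {"220": 1 + 2 * rho ** 2, "400": 3.0, "040": 3.0, "310": 3 * rho, "130": 3 * rho}
print(var_r(rho, n, normal), (1 - rho ** 2) ** 2 / n)  # agree up to rounding
```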


Any parametric function t(u, v, w, a) satisfying (2.6) and conditions (i)-(iii) can generate an estimator of the class (2.5).


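The variance of $\hat\rho_{td}$ in (2.8) is minimized for

$$t_1(P) = \frac{A(\delta_{040}-1) - B\delta_{030}C_x}{2C_x^2(\delta_{040} - \delta_{030}^2 - 1)} = \alpha\ (\text{say}),\qquad t_2(P) = \frac{BC_x^2 - A\delta_{030}C_x}{2C_x^2(\delta_{040} - \delta_{030}^2 - 1)} = \beta\ (\text{say}),$$
$$t_3(P) = \frac{D(\delta_{004}-1) - F\delta_{003}C_z}{2C_z^2(\delta_{004} - \delta_{003}^2 - 1)} = \gamma\ (\text{say}),\qquad t_4(P) = \frac{FC_z^2 - D\delta_{003}C_z}{2C_z^2(\delta_{004} - \delta_{003}^2 - 1)} = \delta\ (\text{say}). \qquad (2.10)$$

Thus the resulting (minimum) variance of $\hat\rho_{td}$ is given by

$$\min\mathrm{Var}(\hat\rho_{td}) = \mathrm{Var}(r) - \rho_{yx}^2\left(\frac1n - \frac1{n_1}\right)\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030} - B\}^2}{4(\delta_{040} - \delta_{030}^2 - 1)}\right] - \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003} - F\}^2}{4(\delta_{004} - \delta_{003}^2 - 1)}\right] \qquad (2.11)$$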
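Collecting terms, (2.8) can be regrouped as $\mathrm{Var}(r) + \rho_{yx}^2(1/n - 1/n_1)Q_x + (\rho_{yx}^2/n_1)Q_z$. Here $Q_x = C_x^2t_1^2 + (\delta_{040}-1)t_2^2 - At_1 - Bt_2 + 2\delta_{030}C_xt_1t_2$ and $Q_z$ is the analogous expression in $(t_3, t_4, D, F, C_z, \delta_{003}, \delta_{004})$. Both pairs in (2.10) therefore come from the same 2x2 system of normal equations, so one helper covers them. The following minimal Python sketch assumes the constants are already known. The helper names and the numeric inputs are hypothetical.

```python
def q_value(t1, t2, A, B, C, d3, d4):
    """Q(t1, t2) = C^2 t1^2 + (d4 - 1) t2^2 - A t1 - B t2 + 2 d3 C t1 t2."""
    return C ** 2 * t1 ** 2 + (d4 - 1) * t2 ** 2 - A * t1 - B * t2 + 2 * d3 * C * t1 * t2


def optimum_pair(A, B, C, d3, d4):
    """Minimizer of Q as in (2.10) and the minimum value of Q.

    (A, B, C_x, delta_030, delta_040) gives (alpha, beta);
    (D, F, C_z, delta_003, delta_004) gives (gamma, delta).
    """
    k = d4 - d3 ** 2 - 1  # >= 0 by Pearson's inequality; must be > 0 here
    if k <= 0:
        raise ValueError("need d4 - d3**2 - 1 > 0")
    den = 2 * C ** 2 * k
    t1 = (A * (d4 - 1) - B * d3 * C) / den
    t2 = (B * C ** 2 - A * d3 * C) / den
    q_min = -0.25 * (A ** 2 / C ** 2 + ((A / C) * d3 - B) ** 2 / k)
    return t1, t2, q_min


def min_var(var_r, rho, n, n1, x_part, z_part):
    """Minimum variance (2.11) given Var(r) and the two parameter tuples."""
    _, _, qx = optimum_pair(*x_part)
    _, _, qz = optimum_pair(*z_part)
    return var_r + rho ** 2 * (1 / n - 1 / n1) * qx + rho ** 2 / n1 * qz


# Hypothetical inputs, for illustration only.
x_part = (0.3, -1.2, 0.9, 1.1, 3.5)  # (A, B, C_x, delta_030, delta_040)
z_part = (0.2, -0.8, 0.7, 0.9, 2.6)  # (D, F, C_z, delta_003, delta_004)
alpha, beta, qx = optimum_pair(*x_part)
print(alpha, beta, min_var(0.05, 0.9, 10, 25, x_part, z_part))
```

Because both minimum values of Q are non-positive and $n_1 > n$, the returned minimum variance never exceeds the supplied Var(r).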
It is observed from (2.11) that if the optimum values of the parameters given by (2.10) are used, the variance of $\hat\rho_{td}$ never exceeds that of r, since the last two terms on the right-hand side of (2.11) are non-negative: $n_1 > n$, and $\delta_{040} - \delta_{030}^2 - 1 \ge 0$ and $\delta_{004} - \delta_{003}^2 - 1 \ge 0$ by Pearson's inequality (kurtosis is at least squared skewness plus one).


Two simple functions t(u, v, w, a) satisfying the required conditions are

$$t(u,v,w,a) = 1 + \alpha_1(u-1) + \alpha_2(v-1) + \alpha_3(w-1) + \alpha_4(a-1),$$
$$t(u,v,w,a) = u^{\alpha_1}v^{\alpha_2}w^{\alpha_3}a^{\alpha_4},$$

and for both these functions $t_1(P) = \alpha_1$, $t_2(P) = \alpha_2$, $t_3(P) = \alpha_3$ and $t_4(P) = \alpha_4$. Thus one should use the optimum values of $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ in $\hat\rho_{td}$ to get the minimum variance. Note that $\hat\rho_{td}$ attains the minimum variance only when the optimum values of the constants $\alpha_i$ (i = 1, 2, 3, 4), which are functions of unknown population parameters, are known. To use such estimators in practice, one has to use guessed values of the population parameters, obtained either through past experience or through a pilot sample survey. Further, even if the values of the constants used in the estimator are not exactly equal to their optimum values as given by (2.10) but are close enough, the resulting estimator will be better than the conventional estimator, as has been illustrated by (Das and Tripathi, 1978, Sec. 3).
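The claim that both functions have $t_k(P) = \alpha_k$ can be checked numerically with central differences at P = (1, 1, 1, 1). In the Python sketch below, the constants are arbitrary illustrative values and the function names are ours.

```python
def t_linear(u, v, w, a, al):
    """t = 1 + a1(u-1) + a2(v-1) + a3(w-1) + a4(a-1)."""
    return 1 + al[0] * (u - 1) + al[1] * (v - 1) + al[2] * (w - 1) + al[3] * (a - 1)


def t_power(u, v, w, a, al):
    """t = u^a1 * v^a2 * w^a3 * a^a4."""
    return u ** al[0] * v ** al[1] * w ** al[2] * a ** al[3]


def partials_at_P(t, al, h=1e-6):
    """Central-difference first partial derivatives of t at P = (1, 1, 1, 1)."""
    grads = []
    for k in range(4):
        plus = [1.0] * 4
        minus = [1.0] * 4
        plus[k] += h
        minus[k] -= h
        grads.append((t(*plus, al) - t(*minus, al)) / (2 * h))
    return grads


al = (0.4, -0.2, 0.7, -0.5)  # arbitrary illustrative constants
print(partials_at_P(t_linear, al))  # close to al
print(partials_at_P(t_power, al))   # close to al
```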

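If no information on the second auxiliary variable z is used, the estimator $\hat\rho_{td}$ reduces to $\hat\rho_{hd}$ defined in (2.2). Setting $t_3(P) = t_4(P) = 0$ in (2.8), and writing $h_1(1,1)$ and $h_2(1,1)$ for the first partial derivatives of h(u, v) with respect to u and v at (1, 1), we get the variance of $\hat\rho_{hd}$, to the first degree of approximation, as

$$\mathrm{Var}(\hat\rho_{hd}) = \mathrm{Var}(r) + \rho_{yx}^2\left(\frac1n - \frac1{n_1}\right)\Big[C_x^2h_1^2(1,1) + (\delta_{040}-1)h_2^2(1,1) - Ah_1(1,1) - Bh_2(1,1) + 2\delta_{030}C_xh_1(1,1)h_2(1,1)\Big] \qquad (2.12)$$

which is minimized for

$$h_1(1,1) = \frac{A(\delta_{040}-1) - B\delta_{030}C_x}{2C_x^2(\delta_{040} - \delta_{030}^2 - 1)},\qquad h_2(1,1) = \frac{BC_x^2 - A\delta_{030}C_x}{2C_x^2(\delta_{040} - \delta_{030}^2 - 1)}. \qquad (2.13)$$

Thus the minimum variance of $\hat\rho_{hd}$ is given by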
$$\min\mathrm{Var}(\hat\rho_{hd}) = \mathrm{Var}(r) - \rho_{yx}^2\left(\frac1n - \frac1{n_1}\right)\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030} - B\}^2}{4(\delta_{040} - \delta_{030}^2 - 1)}\right] \qquad (2.14)$$


It follows from (2.11) and (2.14) that 


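$$\min\mathrm{Var}(\hat\rho_{hd}) - \min\mathrm{Var}(\hat\rho_{td}) = \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003} - F\}^2}{4(\delta_{004} - \delta_{003}^2 - 1)}\right] \qquad (2.15)$$

which is always non-negative. Thus the proposed estimator $\hat\rho_{td}$ is always at least as efficient as $\hat\rho_{hd}$.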


3. A Wider Class of Estimators 


In this section we consider a class of estimators of $\rho_{yx}$ wider than (2.5), given by

$$\hat\rho_{gd} = g(r,u,v,w,a) \qquad (3.1)$$

where g(r, u, v, w, a) is a function of r, u, v, w, a such that

$$g(\rho_{yx},1,1,1,1) = \rho_{yx}\quad\text{and}\quad \left.\frac{\partial g(\cdot)}{\partial r}\right|_{(\rho_{yx},1,1,1,1)} = 1.$$

Proceeding as in Section 2, it can easily be shown, to the first order of approximation, that the minimum variance of $\hat\rho_{gd}$ is the same as that of $\hat\rho_{td}$ given in (2.11).

Note that the difference-type estimator $\hat\rho_{D} = r + \alpha_1(u-1) + \alpha_2(v-1) + \alpha_3(w-1) + \alpha_4(a-1)$ is a particular case of $\hat\rho_{gd}$, but it is not a member of the class $\hat\rho_{td}$ in (2.5).
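The distinction drawn above can be seen numerically. The difference-type estimator meets the two conditions imposed on g in (3.1), but writing it as $r\,t(\cdot)$ would force t to depend on r. The constants in this Python sketch are arbitrary and purely illustrative.

```python
def g_diff(r, u, v, w, a, al):
    """Difference-type estimator r + a1(u-1) + a2(v-1) + a3(w-1) + a4(a-1)."""
    return r + al[0] * (u - 1) + al[1] * (v - 1) + al[2] * (w - 1) + al[3] * (a - 1)


al = (0.05, -0.02, 0.03, 0.01)  # arbitrary illustrative constants
rho = 0.9
h = 1e-6

# Conditions on g in (3.1): g(rho, 1, 1, 1, 1) = rho and dg/dr = 1 there.
dg_dr = (g_diff(rho + h, 1, 1, 1, 1, al) - g_diff(rho - h, 1, 1, 1, 1, al)) / (2 * h)
print(g_diff(rho, 1, 1, 1, 1, al), dg_dr)

# The implied t = g / r at the same (u, v, w, a) changes with r,
# so g_diff cannot be written as r * t(u, v, w, a) as (2.5) requires.
print(g_diff(0.5, 1.1, 1, 1, 1, al) / 0.5, g_diff(0.8, 1.1, 1, 1, 1, al) / 0.8)
```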


4. Optimum Values and Their Estimates 


The optimum values $t_1(P) = \alpha$, $t_2(P) = \beta$, $t_3(P) = \gamma$ and $t_4(P) = \delta$ given at (2.10) involve unknown population parameters. When these optimum values are substituted in (2.5), it no longer remains an estimator, since it involves the unknown $(\alpha, \beta, \gamma, \delta)$. These are functions of unknown population parameters, say $\delta_{pqm}$ (p, q, m = 0, 1, 2, 3, 4), $C_x$, $C_z$ and $\rho_{yx}$ itself. Hence it is advisable to replace them by their consistent estimates from sample values. Let $(\hat\alpha, \hat\beta, \hat\gamma, \hat\delta)$ be consistent estimators of $t_1(P)$, $t_2(P)$, $t_3(P)$ and $t_4(P)$, respectively, where
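$$\hat t_1(P) = \hat\alpha = \frac{\hat A(\hat\delta_{040}-1) - \hat B\hat\delta_{030}\hat C_x}{2\hat C_x^2(\hat\delta_{040} - \hat\delta_{030}^2 - 1)},\qquad \hat t_2(P) = \hat\beta = \frac{\hat B\hat C_x^2 - \hat A\hat\delta_{030}\hat C_x}{2\hat C_x^2(\hat\delta_{040} - \hat\delta_{030}^2 - 1)},$$
$$\hat t_3(P) = \hat\gamma = \frac{\hat D(\hat\delta_{004}-1) - \hat F\hat\delta_{003}C_z}{2C_z^2(\hat\delta_{004} - \hat\delta_{003}^2 - 1)},\qquad \hat t_4(P) = \hat\delta = \frac{\hat FC_z^2 - \hat D\hat\delta_{003}C_z}{2C_z^2(\hat\delta_{004} - \hat\delta_{003}^2 - 1)} \qquad (4.1)$$

with

$$\hat A = \{\hat\delta_{210} + \hat\delta_{030} - 2(\hat\delta_{120}/r)\}\hat C_x,\qquad \hat B = \hat\delta_{220} + \hat\delta_{040} - 2(\hat\delta_{130}/r),$$
$$\hat D = \{\hat\delta_{201} + \hat\delta_{021} - 2(\hat\delta_{111}/r)\}C_z,\qquad \hat F = \hat\delta_{202} + \hat\delta_{022} - 2(\hat\delta_{112}/r),$$
$$\hat C_x = s_x/\bar x,\qquad C_z = S_z/\bar Z,\qquad \hat\delta_{pqm} = \frac{\hat\mu_{pqm}}{\hat\mu_{200}^{p/2}\,\hat\mu_{020}^{q/2}\,\hat\mu_{002}^{m/2}},$$
$$\hat\mu_{pqm} = \frac1n\sum_{i=1}^{n}(y_i-\bar y)^p(x_i-\bar x)^q(z_i-\bar z)^m,\quad s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar x)^2,\quad \bar x = \frac1n\sum_{i=1}^{n}x_i,\quad \bar y = \frac1n\sum_{i=1}^{n}y_i,\quad \bar z = \frac1n\sum_{i=1}^{n}z_i.$$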

A 


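We then replace $(\alpha, \beta, \gamma, \delta)$ by $(\hat\alpha, \hat\beta, \hat\gamma, \hat\delta)$ in the optimum $\hat\rho_{td}$, resulting in the estimator $\hat\rho^{*}_{td}$, say, which is defined by

$$\hat\rho^{*}_{td} = r\,t^{*}(u,v,w,a,\hat\alpha,\hat\beta,\hat\gamma,\hat\delta), \qquad (4.2)$$

where the function $t^{*}(U)$, $U = (u,v,w,a,\hat\alpha,\hat\beta,\hat\gamma,\hat\delta)$, is derived from the function t(u, v, w, a) given at (2.5) by replacing the unknown constants involved in it by the consistent estimates of their optimum values. The condition (2.6) will then imply that

$$t^{*}(P^{*}) = 1, \qquad (4.3)$$

where $P^{*} = (1,1,1,1,\alpha,\beta,\gamma,\delta)$.

We further assume that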
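$$t_1^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial u}\right|_{U=P^{*}} = \alpha,\quad t_2^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial v}\right|_{U=P^{*}} = \beta,\quad t_3^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial w}\right|_{U=P^{*}} = \gamma,\quad t_4^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial a}\right|_{U=P^{*}} = \delta,$$
$$t_5^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial \hat\alpha}\right|_{U=P^{*}} = 0,\quad t_6^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial \hat\beta}\right|_{U=P^{*}} = 0,\quad t_7^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial \hat\gamma}\right|_{U=P^{*}} = 0,\quad t_8^{*}(P^{*}) = \left.\frac{\partial t^{*}(U)}{\partial \hat\delta}\right|_{U=P^{*}} = 0. \qquad (4.4)$$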


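Expanding $t^{*}(U)$ about $P^{*} = (1,1,1,1,\alpha,\beta,\gamma,\delta)$ in a Taylor's series, we have

$$\begin{aligned}
\hat\rho^{*}_{td} = r\Big[& t^{*}(P^{*}) + (u-1)t_1^{*}(P^{*}) + (v-1)t_2^{*}(P^{*}) + (w-1)t_3^{*}(P^{*}) + (a-1)t_4^{*}(P^{*}) + (\hat\alpha-\alpha)t_5^{*}(P^{*})\\
& + (\hat\beta-\beta)t_6^{*}(P^{*}) + (\hat\gamma-\gamma)t_7^{*}(P^{*}) + (\hat\delta-\delta)t_8^{*}(P^{*}) + \text{second-order terms}\Big]
\end{aligned} \qquad (4.5)$$

Using (4.3) and (4.4) in (4.5), we have

$$\hat\rho^{*}_{td} = r\big[1 + (u-1)\alpha + (v-1)\beta + (w-1)\gamma + (a-1)\delta + \text{second-order terms}\big] \qquad (4.6)$$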


Expressing (4.6) in term of e’s squaring and retaining terms of e’s up to second degree, 


we have 
na 1 * * * * 
(Pra = Pyx)° = Pils Qes —&) —@,) + ae, —e,) + Ble, —e)) + 723 +6e,]” (4.7) 


Taking expectation of both sides in (4.7), we get the variance of ;, to the first degree of 


approximation, as 
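$$\mathrm{Var}(\hat\rho^{*}_{td}) = \mathrm{Var}(r) - \rho_{yx}^2\left(\frac1n - \frac1{n_1}\right)\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030} - B\}^2}{4(\delta_{040} - \delta_{030}^2 - 1)}\right] - \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003} - F\}^2}{4(\delta_{004} - \delta_{003}^2 - 1)}\right] \qquad (4.8)$$

which is the same as (2.11). We have thus established the following result.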


Result 4.1: If the optimum values of the constants in (2.10) are replaced by their consistent estimators and conditions (4.3) and (4.4) hold, the resulting estimator $\hat\rho^{*}_{td}$ has the same variance, to the first degree of approximation, as that of the optimum $\hat\rho_{td}$.


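Remark 4.1: It can easily be verified that the following special cases of $\hat\rho^{*}_{td}$:

(i) $r\,u^{\hat\alpha}v^{\hat\beta}w^{\hat\gamma}a^{\hat\delta}$,
(ii) $r\,\dfrac{1 + \hat\alpha(u-1) + \hat\gamma(w-1)}{1 - \hat\beta(v-1) - \hat\delta(a-1)}$,
(iii) $r\big[1 + \hat\alpha(u-1) + \hat\beta(v-1) + \hat\gamma(w-1) + \hat\delta(a-1)\big]$,
(iv) $r\big[1 - \hat\alpha(u-1) - \hat\beta(v-1) - \hat\gamma(w-1) - \hat\delta(a-1)\big]^{-1}$

of $\rho_{yx}$ satisfy the conditions (4.3) and (4.4) and attain the variance (4.8).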
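The four special cases of Remark 4.1 can be coded directly. In the Python sketch below, the names and the constants are hypothetical. All four coincide with r at P and differ from one another only at second order near P, which is consistent with their common first-order variance (4.8).

```python
def special_cases(r, u, v, w, a, al, be, ga, de):
    """Estimators (i)-(iv) of Remark 4.1 for given estimated constants."""
    lin = al * (u - 1) + be * (v - 1) + ga * (w - 1) + de * (a - 1)
    est1 = r * u ** al * v ** be * w ** ga * a ** de
    est2 = r * (1 + al * (u - 1) + ga * (w - 1)) / (1 - be * (v - 1) - de * (a - 1))
    est3 = r * (1 + lin)
    est4 = r / (1 - lin)
    return est1, est2, est3, est4


# Hypothetical estimated constants.
al, be, ga, de = 0.4, -0.3, 0.6, -0.2
at_P = special_cases(0.9, 1, 1, 1, 1, al, be, ga, de)
eps = 1e-4
near_P = special_cases(0.9, 1 + eps, 1 - eps, 1 + eps, 1 - eps, al, be, ga, de)
print(at_P)
print(max(near_P) - min(near_P))  # of order eps**2
```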


Remark 4.2: The efficiencies of the estimators discussed in this paper can be compared for fixed cost, following the procedure given in (Sukhatme et al., 1984).


5. Empirical Study 

To illustrate the performance of the various estimators of the population correlation coefficient, we consider the data given in (Murthy, 1967, p. 226). The variates are:
y = output, x = number of workers, z = fixed capital,
N = 80, n = 10, $n_1$ = 25,
$\bar X = 283.875$, $\bar Y = 5182.638$, $\bar Z = 1126$, $C_x = 0.9430$, $C_y = 0.3520$, $C_z = 0.7460$,
Syp3 =1.030, Syoq = 2.8664, Sy), = 1.1859, Sy), = 3.1522, Soyo =1.295, Sosy = 3-65; 
Sy) = 0.7491, Sy) = 0.9145, 6,1, = 0.8234, Oy) = 2.8525, 


57 = 2.5454, 5549 =0.5475, 5499 =2.3377, Sy, = 0.4546, Sy) = 2.2208, 54, = 0.1301, 
Sy) = 2.2667, Py. = 0.9136, Py, = 0.9859, p,, = 0.9413. 


The percent relative efficiencies (PREs) of $\hat\rho_{hd}$ and $\hat\rho_{td}$ (or $\hat\rho^{*}_{td}$) with respect to the conventional estimator r have been computed and compiled in Table 5.1.
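For reference, the percent relative efficiency is presumably the usual ratio $\mathrm{PRE}(\cdot, r) = 100\,\mathrm{Var}(r)/\mathrm{Var}(\cdot)$. This definition is an assumption, since the text does not state it. Under that assumption, the PREs reported in Table 5.1 translate into variance ratios as in the Python sketch below (function names illustrative).

```python
def pre(var_r, var_est):
    """Percent relative efficiency PRE(est, r) = 100 * Var(r) / Var(est)."""
    return 100.0 * var_r / var_est


def variance_ratio(pre_value):
    """Var(est) / Var(r) implied by a reported PRE."""
    return 100.0 / pre_value


# Implied variance ratios for the PREs reported in Table 5.1.
for label, value in [("rho_hd", 129.147), ("rho_td", 305.441)]:
    print(label, round(variance_ratio(value), 4))
```

Under this reading, a PRE of 305.441 means the estimator's variance is roughly one third of Var(r).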


Table 5.1: The PRE’s of different estimators of :,, 


Estimator r Pra Pa (OF Pu) 


PRE(.,r) 100 129.147 305.441 


Table 5.1 clearly shows that the proposed estimator $\hat\rho_{td}$ (or $\hat\rho^{*}_{td}$) is more efficient than r and $\hat\rho_{hd}$.